ABSTRACT

In this paper we extend the previously published DALI-approximation for likelihoods
to cases in which the parameter dependency is in the covariance matrix. The
approximation recovers non-Gaussian likelihoods, and reduces to the Fisher matrix
approach in the case of Gaussianity. It works with the minimal assumptions of having
Gaussian errors on the data, and a covariance matrix that possesses a converging
Taylor approximation. The resulting approximation works in cases of severe parameter
degeneracies and in cases where the Fisher matrix is singular. It is at least 1000
times faster than a typical Monte Carlo Markov Chain run over the same parameter
space. Two example applications, to cases of extremely non-Gaussian likelihoods, are
presented - one demonstrates how the method succeeds in completely reconstructing a
ring-shaped likelihood. A public code is released here: DALI.


1 INTRODUCTION 

Evaluating a multidimensional likelihood can be a computationally costly
procedure. If speed matters, often a good approximation of the likelihood is
required. A widely used approximation of likelihoods is the Fisher matrix
approximation, which singles out the Gaussian part of a likelihood (Tegmark et al.
1997). Because many analytical results for Gaussians are available, such as the
position of the 1-$\sigma$ confidence contours and higher-order equivalents, the
Fisher matrix approximation is numerically fast to evaluate. It has also become
widely used as it allows for the easy computation of Figures of Merit, simple
determinants of the matrix elements and manipulations thereof, that can be used to
evaluate the expected performance of an experiment, for example as introduced to
dark energy research by Albrecht et al. (2006).

The alternatives to the Gaussian approximation are grid-evaluations of the
likelihood, or sampling techniques such as Monte Carlo Markov Chains (MCMC), Nested
Sampling (Audren et al. 2013; Allison & Dunkley 2014; Skilling 2004), and
Population Monte Carlo, which uses iterative updates of a mixture model to capture
non-Gaussianities (Kilbinger et al. 2010; Wraith et al. 2009). These methods tackle
the challenge of characterising non-Gaussian likelihoods by using sophisticated
algorithms. Gram-Charlier and Edgeworth-type expansions can also be used to capture
non-Gaussianities, but suffer from regions in the parameter space where the
approximated likelihood turns negative, thereby violating the Kolmogorov axioms for
a probability (Cramér 1946).

Nonetheless, likelihood approximations are urgently needed throughout the physical
sciences, whenever evaluating a full likelihood is numerically too costly, e.g. when
forecasting parameter constraints of a future experiment, where many different
configurations need to be simulated, see e.g. (Pillepich et al. 2012; Laureijs et
al. 2011). A quick check of the resulting likelihood is also desirable when
optimizing a data analysis pipeline, or when establishing novel observables and
testing how precisely they can constrain model parameters, see e.g. (Chantavat et
al. 2014). Non-Gaussian likelihood approximations that maintain positive
definiteness and normalizability, whilst rivaling the Fisher matrix in terms of
speed, have recently become a focus of research. Transformations of the likelihood
to Gaussianity are one way of tackling this problem (Joachimi & Taylor 2011).
Another approach named 'DALI' was presented in Sellentin et al. (2014) (henceforth
named 'Paper 1'), under the additional constraint of the data being Gaussianly
distributed and the covariance matrix being constant. The main results of Paper 1
were application independent, i.e. the presented approximation would work for all
observables to which it would be specified. The appendix contained insights into
how the non-Gaussian likelihood approximation could also be applied to cases where
the covariance matrix depends on parameters - however, additional assumptions about
the covariance matrix needed to be made, which are fulfilled only for specific
applications.

In this paper, we extend the results of Paper 1 and present a non-Gaussian
likelihood approximation that can deal with parameter-dependent covariance
matrices. The main results will again be independent of the physical application,
meaning the method can be applied in any field of physics, as well as in cosmology,
or any other scientific branch that compares a parameterized model to data. The
method only demands that the data shall again be Gaussianly distributed. Therefore,
a public code DALI is being released along with this paper which allows the user to
interface the DALI-formalism with their physical problems. The code also contains
the results of Paper 1. A cosmological application to weak lensing will be
presented in Sellentin & Schaefer 2015 (in prep.).


2 GAUSSIANITY 

Throughout the paper, we assume a data set $x$ with Gaussian errors, leading to the
unapproximated likelihood

L(x|p) = \frac{1}{\sqrt{(2\pi)^d |C|}} \exp\left(-\frac{1}{2} (x-\mu)^T C^{-1} (x-\mu)\right),   (1)

where p is a vector that holds p parameters. The mean of the data $\mu$ and the
covariance matrix $C$ are predicted by a parameterized physical model and can in
general both depend on the p parameters. These parameters shall be constrained by
maximizing the likelihood using the data, which are collected in the data vector
$x$. The number of data points is $d$ and $|C|$ is the determinant of the
covariance matrix. The covariance matrix is given by

C(p) = \langle (x-\mu)(x-\mu)^T \rangle,   (2)

such that for a linear model, the parameters enter already quadratically in the
covariance matrix. In general, the parameter dependence of the covariance matrix
will be determined by the estimator applied and will often also include nuisance
parameters (Taylor & Joachimi 2014).

The corresponding log-likelihood $\mathcal{L} = -\ln(L)$ of the Gaussian Eq. (1) is

\mathcal{L} = \frac{1}{2} \mathrm{Tr}\left[\ln(C) + C^{-1} \langle D \rangle\right],   (3)

where we neglected the $2\pi$ factors of the normalization, and where
$D = (x-\mu)(x-\mu)^T$ is the data matrix. Angular brackets denote averaging over
the data.
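As a minimal sketch of Eq. (3), assume a zero-mean model with a diagonal covariance
matrix, so that the trace reduces to a sum over data points. The function name
`neg_log_likelihood` and the toy numbers are illustrative assumptions and are not
taken from the public DALI code.

```python
import math

def neg_log_likelihood(variances, mean_squares):
    """Eq. (3) for a diagonal covariance matrix.

    variances    -- diagonal entries C_ii predicted by the model
    mean_squares -- diagonal entries <D>_ii of the averaged data matrix
    """
    return 0.5 * sum(math.log(c) + d / c for c, d in zip(variances, mean_squares))

# Illustrative averaged data matrix for three data points.
data = [1.0, 2.0, 0.5]

# The log-likelihood is smallest when the model covariance matches the data.
best = neg_log_likelihood(data, data)
too_small = neg_log_likelihood([0.5 * d for d in data], data)
too_large = neg_log_likelihood([2.0 * d for d in data], data)
print(best, too_small, too_large)
```

Each term ln(c) + d/c is minimal at c = d, which is why both the underestimated and
the overestimated covariance give a larger value of the log-likelihood.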

The numerical costs of evaluating this likelihood will increase with the number of
data points, the complexity of calculating the model predictions $\mu$ and the
estimation of the covariance matrix under variation of the parameters. In case of
Bayesian inference, the likelihood could be updated to a posterior by multiplying
with priors and normalizing by the corresponding evidence.

The assumption of Gaussian errors is not a severe constraint, since due to the
central limit theorem, all data that stem from a distribution of finite variance
can be rebinned into a data set with Gaussian errors - if enough data points are
available. However, having Gaussian errors in the data space does not mean that the
resulting likelihood will be Gaussian in the parameter space. Therefore, the
mathematical tools available to exploit Gaussian likelihoods, such as their
analytical marginalization over nuisance parameters, cannot be automatically
exploited in the parameter space. The Gaussianity of the data set only transfers to
the parameter space if no parameter degeneracies occur and if the model that is
compared to the data is linear in all parameters. Similarly, a Gaussian likelihood
can also be expected if the data set is constraining enough, such that essentially
a linear Taylor approximation of the model and the covariance matrix around the
best fit point is sufficient. This explains why the Fisher matrix has become so
popular in forecasting the performance of precision experiments, which were
designed to tightly constrain targeted parameters.

In contrast, achieving extremely constraining data with a new experiment cannot be
expected by default if, for example, extensions to a standard model are to be
investigated and new parameters measured for the very first time. If the forecasted
data is not expected to be extremely constraining, the likelihood will not be
sharply peaked around the best fit, and a linear Taylor approximation of the model
and the covariance matrix alone may not be good enough. This already hints at why
the following non-Gaussian likelihood approximation needs to build on higher order
derivatives.

The higher order likelihood approximation for a constant covariance matrix was
derived in Paper 1. Here, we specialize to the case of the model dependence of the
mean being identically zero, $\mu(p) = 0$, and all parameter dependence is
contained in the covariance matrix. This can be the case in a real scenario, where
the mean is zero but fluctuations around that mean can be of different amplitudes,
and this is encoded in the covariance. One example is a measurement of pure noise,
which clearly has mean zero, but where the covariance of the noise depends on
parameters. Another example is any kind of mode decomposition, where again it is
clear that a mode has mean zero. A cosmological example is the galaxy power
spectrum, which arises from density fluctuations around the cosmic mean value, and
where the mean overdensity must be zero, due to mass conservation. The power
spectrum can then be used as the covariance in the following framework, where it is
the covariance of the Fourier amplitudes of the overdensity field.¹


3 PROBLEMS WHEN APPROXIMATING 
LIKELIHOODS 

Approximating a likelihood is more complicated than approximating a more general
function because one typically wishes the likelihood to be positive semi-definite
at all orders; otherwise negative probabilities occur, which are nonsensical.
Positive semi-definiteness is a strong constraint and not automatically fulfilled
by a usual Taylor series approximation of the likelihood. For example, Taylor
approximating a standard normal distribution yields

\exp(-x^2/2) = 1 - \frac{x^2}{2} + O(x^4).   (4)

If truncated at second order, this approximation becomes negative at 2-$\sigma$
from the best fit, or begins rising to infinity at about 2-$\sigma$ when truncated
at fourth order. This divergence makes the likelihood approximation not
normalizable, such that no measure for relative likelihoods can be defined. Both
the second and the fourth order approximation of the standard normal distribution
therefore violate defining properties of a likelihood. Obviously, a continuation of
the Taylor approximation Eq. (4) to very high orders would remedy both of these
issues, but this would be a cumbersome approach. It is well known that Taylor
approximating the log-likelihood instead reconstructs the Gaussian likelihood much
more quickly,

\exp(-x^2/2) = \exp(-\mathcal{L}) = \exp(-T(\mathcal{L})),   (5)

where $T(\mathcal{L})$ denotes the Taylor series of the log-likelihood. If this
Taylor series is evaluated at the maximum of the standard normal distribution, then
already the first and second order terms of this series recover the Gaussian
likelihood completely, and all higher orders of the series are identically zero.
The approximation schemes Eq. (4) and Eq. (5) are both mathematically valid ways of
approximating the standard normal distribution, even though they lead to entirely
different Taylor series. The scheme outlined in Eq. (5) is however much more
advantageous because it already yields the desired approximation at second order,
and negative likelihoods then do not appear at all, since the exponential function
is always positive. Therefore, the choice of which quantity shall be approximated
influences decisively how quickly the approximation recovers the shape of the
original function, and whether unwanted artifacts appear when truncating the
approximation at low orders.

¹ Often, however, such analyses are carried out by comparing a measured power
spectrum to a parameterized power spectrum, which is then treated as the mean. In
these cases the covariance matrix would then be the covariance of the power
spectrum (a four-point function) instead of the power spectrum.
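The contrast between Eq. (4) and Eq. (5) can be illustrated numerically. The sketch
below assumes the standard normal shape exp(-x^2/2); the helper names are
illustrative only.

```python
import math

def likelihood_taylor(x, order):
    """Taylor series of exp(-x**2 / 2) around x = 0, truncated at `order`."""
    total = 0.0
    for n in range(order // 2 + 1):
        total += (-0.5 * x * x) ** n / math.factorial(n)
    return total

def log_likelihood_taylor(x):
    """exp(-T(L)), where T(L) is the second-order Taylor series of L = x**2 / 2.

    The series of the log-likelihood terminates at second order, so this is exact.
    """
    return math.exp(-0.5 * x * x)

second = likelihood_taylor(2.0, 2)       # second-order truncation of Eq. (4)
fourth_near = likelihood_taylor(2.0, 4)  # fourth-order truncation at x = 2
fourth_far = likelihood_taylor(4.0, 4)   # ... and further out, at x = 4
exact = log_likelihood_taylor(2.0)       # scheme of Eq. (5)
print(second, fourth_near, fourth_far, exact)
```

The second-order truncation of the likelihood is negative at x = 2, the
fourth-order truncation grows with distance from the peak, and the exponentiated
series of the log-likelihood stays positive and bounded.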

The choice of Taylor approximating the log-likelihood, instead of the likelihood,
to second order in multiple dimensions yields a Hessian matrix whose expectation
value is the Fisher (or Information) matrix. Denoting partial derivatives by
$\partial_\alpha f = f_{,\alpha}$, the Fisher matrix of Eq. (1) can be written as

F_{\alpha\beta} = \langle \mathcal{L}_{,\alpha\beta} \rangle |_{\hat{p}}
                = \frac{1}{2} \mathrm{Tr}\left[C_0^{-1} C_{,\alpha} C_0^{-1} C_{,\beta}\right] + \mu_{,\alpha}^T C_0^{-1} \mu_{,\beta},   (6)

where the derivatives are evaluated at the maximum likelihood point $\hat{p}$ and
summation over repeated indices is implied. All quantities that are to be evaluated
at the maximum likelihood point are marked by a subscript zero. Consequently,
$C_0$ is constant and cannot be differentiated with respect to parameters.

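The corresponding likelihood approximation is then given by

L(x|p) \approx N \cdot \exp\left(-\frac{1}{2} F_{\alpha\beta} \Delta p_\alpha \Delta p_\beta\right),   (7)

where the $\Delta p_\alpha = p_\alpha - \hat{p}_\alpha$ are the offsets from the
best fit point $\hat{p}_\alpha$ and $N$ is a normalization constant.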
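A minimal sketch of Eqs. (6) and (7) for the zero-mean case follows. It assumes a
diagonal covariance matrix stored as a list of its diagonal entries, and a
hypothetical two-parameter model in which the variance of data point i is
p1 + p2 * t_i; neither the model nor the function names come from the paper.

```python
def fisher_matrix(c0, derivs):
    """F_ab = 1/2 Tr[C0^-1 C_,a C0^-1 C_,b] for diagonal matrices given as lists."""
    n = len(derivs)
    fisher = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            fisher[a][b] = 0.5 * sum(
                da * db / (c * c) for da, db, c in zip(derivs[a], derivs[b], c0)
            )
    return fisher

def fisher_exponent(fisher, dp):
    """1/2 F_ab dp_a dp_b, i.e. minus the argument of the exponential in Eq. (7)."""
    size = len(dp)
    return 0.5 * sum(
        fisher[a][b] * dp[a] * dp[b] for a in range(size) for b in range(size)
    )

# Hypothetical model: variance of data point i is p1 + p2 * t_i.
t = [0.0, 1.0, 2.0, 3.0]
p_hat = (1.0, 0.5)                      # assumed maximum likelihood point
c0 = [p_hat[0] + p_hat[1] * ti for ti in t]
derivs = [[1.0] * len(t), t]            # diagonals of dC/dp1 and dC/dp2

F = fisher_matrix(c0, derivs)
det_F = F[0][0] * F[1][1] - F[0][1] * F[1][0]
print(F, det_F)
```

For this model the determinant is positive, so the Fisher approximation yields
closed elliptical contours. A vanishing determinant would signal a degeneracy that
the Gaussian approximation cannot resolve.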

The Fisher approximation results in the usual ellipsoidal, multi-variate correlated
Gaussian confidence contours, which often do not recover the shape of a
non-Gaussian likelihood distribution well. A continuation of the Taylor series is
then desirable in order to capture these non-Gaussianities. This wish for a
continuation of the series is predicated on the requirement to solve the issue of
normalizability and positive-definiteness at all orders. Also, it is preferable to
recover the essential shape of the likelihood with as few additional terms as
possible for computational efficiency. Clearly, just as there exist multiple ways
of approximating the likelihood Eq. (4), there will exist multiple ways of
continuing the approximation from that given by the Fisher matrix. These extended
approximations will pick up the desired information about the likelihood's shape
with different efficiencies. An obvious extension would be the continuation of the
log-likelihood's Taylor-approximation

L(x|p) \approx N \exp\Big( -\frac{1}{2} F_{\alpha\beta} \Delta p_\alpha \Delta p_\beta
                           - \frac{1}{3!} S_{\alpha\beta\gamma} \Delta p_\alpha \Delta p_\beta \Delta p_\gamma
                           - \frac{1}{4!} Q_{\alpha\beta\gamma\delta} \Delta p_\alpha \Delta p_\beta \Delta p_\gamma \Delta p_\delta
                           + O(\Delta p^5) \Big),   (8)

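where

S_{\alpha\beta\gamma} = \langle \mathcal{L}_{,\alpha\beta\gamma} \rangle |_{\hat{p}}
    = -2 \mathrm{Tr}\left[C_0^{-1} C_{,\gamma} C_0^{-1} C_{,\beta} C_0^{-1} C_{,\alpha}\right]
      + \frac{3}{2} \mathrm{Tr}\left[C_0^{-1} C_{,\gamma} C_0^{-1} C_{,\alpha\beta}\right],   (9)

and

Q_{\alpha\beta\gamma\delta} = \langle \mathcal{L}_{,\alpha\beta\gamma\delta} \rangle |_{\hat{p}}
    = 9 \mathrm{Tr}\left[C_0^{-1} C_{,\delta} C_0^{-1} C_{,\gamma} C_0^{-1} C_{,\beta} C_0^{-1} C_{,\alpha}\right]
      + \frac{3}{2} \mathrm{Tr}\left[C_0^{-1} C_{,\gamma\delta} C_0^{-1} C_{,\alpha\beta}\right]
      - 12 \mathrm{Tr}\left[C_0^{-1} C_{,\gamma\delta} C_0^{-1} C_{,\beta} C_0^{-1} C_{,\alpha}\right]
      + 2 \mathrm{Tr}\left[C_0^{-1} C_{,\gamma} C_0^{-1} C_{,\alpha\beta\delta}\right],   (10)

which gives the Taylor series of the log-likelihood up to fourth order, after being
averaged over the data.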

In reference to Eq. (8) multiple observations can be made. Firstly, this
approximation will in general be unnormalizable since it will diverge somewhere in
parameter space. This is partly due to the odd powers of $\Delta p$, which will
clearly become negative on one side of the fiducial point (about which the
expansion is made) if they are positive on the other side of the fiducial point;
the argument of the exponential function will then become positive even for small
displacements from the best fit point, and the approximation will begin to diverge.
Also the summation over even powers of $\Delta p$ can lead to divergences, as terms
of the structure $\Delta p_1 \Delta p_1 \Delta p_1 \Delta p_2$ will appear, as has
been detailed in Paper 1. These divergences of the approximation can only be
avoided in an application-independent way if the argument of the exponential is
negative everywhere in parameter space. One way of achieving this is to demand the
approximation to have the shape

L \propto \exp(-Q),   (11)

where $Q$ is a quadratic function of the parameters and therefore always positive
definite. The expansion Eq. (8) of the log-likelihood does not have this shape.
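The divergence can be made concrete with a one-parameter sketch. Assume a
hypothetical zero-mean model whose covariance is C = p times the unit matrix for d
data points, expanded around the best fit p = 1. Evaluating Eqs. (6) and (9) for
this model gives F = d/2 and S = -2d, since all second derivatives of C vanish. The
model and the function names are assumptions made for illustration.

```python
D = 50  # number of data points (illustrative)

def truncated_log_exponent(dp):
    """Argument of the exponential in Eq. (8), truncated after the cubic term."""
    fisher = 0.5 * D   # F from Eq. (6) for C = p * identity at p = 1
    cubic = -2.0 * D   # S from Eq. (9) for the same model
    return -0.5 * fisher * dp**2 - cubic * dp**3 / 6.0

def quadratic_exponent(dp):
    """Argument of the exponential for the shape of Eq. (11), with Q = (d/4) dp^2."""
    return -0.25 * D * dp**2

for dp in (0.5, 1.0, 2.0, 3.0):
    print(dp, truncated_log_exponent(dp), quadratic_exponent(dp))
```

For this model the truncated exponent turns positive for offsets larger than 0.75
and keeps growing, so the approximated likelihood cannot be normalized, whereas the
quadratic exponent is never positive.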

Secondly, we observe that even if only first order derivatives of the covariance
matrix were non-vanishing, the above series would still not terminate after the
Fisher approximation. The first lines of Eq. (9) and Eq. (10) contain only first
order derivatives of the covariance matrix and make it clear that at the n-th
Taylor order a term of the shape

\mathrm{Tr}\left[(C_0^{-1} C_{,\alpha} \Delta p_\alpha)^n\right]   (12)

appears, where we have expressed the repeated multiplication of the same matrices
as a power. As new information on the parameter dependence of the covariance matrix
is encoded in its higher order derivatives, the terms Eq. (12) do not add any of
the new information which we target; they simply stem from the slowly convergent
Taylor series of the logarithm.

Therefore we see that a Taylor approximation of the log-likelihood beyond second
order is a valid but laborious way to include non-Gaussian behaviour: the
log-likelihood would need to be approximated to much higher than the 4th order
before it can be expected to be normalizable for a physical application. To avoid
all of the difficulties discussed above, we construct a likelihood approximation in
which we explicitly request that it shall have the shape Eq. (11). This can be
achieved by Taylor approximating the covariance matrix directly, which has the
further advantage that $C$ depends on the parameters more sensitively than
$\ln(C)$. Thus, Taylor expanding the covariance matrix will pick up the higher
order derivatives earlier. This approximation is deduced in Sec. (4) and the
convergence criterion for this approximation is given in Sec. (5).


4 BEYOND GAUSSIANITY 


We express the variation of the covariance matrix over the parameter space by its
Taylor series and single out the constant zeroth-order term

C(p) = C_0 + T(C),   (13)

where $C_0$ is the constant covariance matrix evaluated at the likelihood maximum,
and

T(C) = \sum_{n=1}^{\infty} \frac{1}{n!} C_{,\alpha \ldots \nu} |_{\hat{p}} (p_\alpha - \hat{p}_\alpha) \ldots (p_\nu - \hat{p}_\nu)   (14)

is the p-dimensional Taylor series of the covariance matrix, beginning at the first
derivative. The derivatives are chosen to be evaluated at the maximum of the
likelihood, denoted by $\hat{p}$. This series carries information on how the
covariance matrix changes throughout the parameter space. Here, we are specifically
interested in higher order derivatives of the covariance matrix, since these encode
the non-linear dependence of the covariance matrix on parameters. For $\mu = 0$ the
data matrix is $D = x x^T$, which is parameter independent. The log-likelihood is
then given by


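\mathcal{L} = \frac{1}{2} \mathrm{Tr}\left[\ln(C)\right] + \frac{1}{2} \mathrm{Tr}\left[\langle x x^T \rangle C^{-1}\right]
            = \frac{1}{2} \mathrm{Tr}\left[ \ln\left(C_0 \left[1 + C_0^{-1} T(C)\right]\right) + \langle x x^T \rangle \left(C_0 + T(C)\right)^{-1} \right],   (15)

where angular brackets denote averaging over the data and $\langle x x^T \rangle$
is kept explicitly, in order to emphasize that it does not depend on parameters,
although it will later average out to be the measured covariance matrix. So far,
the covariance matrix has only been rewritten, but no approximation has been made.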

However, if the Taylor series $T(C)$ is evaluated only sufficiently close to the
maximum likelihood point, then we will have $T(C) \ll C_0$ and we can consistently
approximate Eq. (15) up to second order in $T(C)$. This leads to the targeted shape
Eq. (11). We therefore approximate by applying the matrix inversion identity (also
known as Woodbury identity)

(A + B)^{-1} = A^{-1} - A^{-1} \left(1 + B A^{-1}\right)^{-1} B A^{-1}   (16)

to find an approximation for the inverted covariance matrix

\left(C_0 + T(C)\right)^{-1} = C_0^{-1} - C_0^{-1} T(C) C_0^{-1}
                               + C_0^{-1} T(C) C_0^{-1} T(C) C_0^{-1} + O(3),   (17)

where the approximation was truncated at second order since we target the shape
Eq. (11). The Taylor expansion of the logarithm up to its quadratic term is

\ln(1 + x) = x - \frac{x^2}{2} + O(x^3).   (18)
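The second-order truncation of the inverse in Eq. (17) can be checked numerically.
The sketch below uses hand-written 2x2 matrix helpers and arbitrary illustrative
matrices for C_0 and T(C); the neglected terms are of third order in T(C), so
halving T(C) should reduce the error by roughly a factor of eight.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def second_order_inverse(c0, t):
    """C0^-1 - C0^-1 T C0^-1 + C0^-1 T C0^-1 T C0^-1, as in Eq. (17)."""
    c0_inv = inverse(c0)
    first = matmul(matmul(c0_inv, t), c0_inv)
    second = matmul(matmul(first, t), c0_inv)
    return [[c0_inv[i][j] - first[i][j] + second[i][j] for j in range(2)]
            for i in range(2)]

def max_error(c0, t):
    """Largest element-wise deviation from the exact inverse of C0 + T."""
    total = [[c0[i][j] + t[i][j] for j in range(2)] for i in range(2)]
    exact = inverse(total)
    approx = second_order_inverse(c0, t)
    return max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))

C0 = [[2.0, 0.3], [0.3, 1.0]]        # illustrative fiducial covariance
T = [[0.10, 0.02], [0.02, -0.05]]    # illustrative small perturbation T(C)

err_full = max_error(C0, T)
err_half = max_error(C0, [[0.5 * value for value in row] for row in T])
print(err_full, err_half)
```

The same scaling argument applies to the logarithm in Eq. (18), whose first
neglected term is cubic as well.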

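The quadratic approximation of the log-likelihood then becomes

\mathcal{L} \approx \frac{1}{2} \mathrm{Tr}\left[ \ln(C_0) + C_0^{-1} T(C) - \frac{1}{2} C_0^{-1} T(C) C_0^{-1} T(C) \right]
            + \frac{1}{2} \mathrm{Tr}\left[ \langle x x^T \rangle \left( C_0^{-1} - C_0^{-1} T(C) C_0^{-1}
            + C_0^{-1} T(C) C_0^{-1} T(C) C_0^{-1} + O(3) \right) \right].   (19)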

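Applying $\langle x x^T \rangle = C_0$, the likelihood approximation simplifies to

L \approx N \exp(-\mathcal{L})
  = N \exp\left( -\frac{1}{4} \mathrm{Tr}\left[ C_0^{-1} T(C) C_0^{-1} T(C) \right] + O(3) \right)
  = N \exp\Big( -\frac{1}{4} \mathrm{Tr}\Big[ C_0^{-1} \left( C_{,\alpha} \Delta p_\alpha + \frac{1}{2} C_{,\alpha\beta} \Delta p_\alpha \Delta p_\beta + \ldots \right)
                                              C_0^{-1} \left( C_{,\alpha} \Delta p_\alpha + \frac{1}{2} C_{,\alpha\beta} \Delta p_\alpha \Delta p_\beta + \ldots \right) \Big] + O(3) \Big),   (20)

where $\ln(C_0)$ and $C_0 C_0^{-1} = 1$ are constants and were absorbed into the
normalization constant $N$. In the last step, a repeated multiplication of the same
terms appears. This can be rewritten as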


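L \approx N \exp\left( -\frac{1}{4} \mathrm{Tr}\left[ \left( C_0^{-1} \left( C_{,\alpha} \Delta p_\alpha + \frac{1}{2} C_{,\alpha\beta} \Delta p_\alpha \Delta p_\beta + \ldots \right) \right)^2 \right] + O(3) \right),   (21)

where the repeated multiplication of the same matrices in the trace was made more
explicit by denoting it as a square.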
We therefore have arrived at an approximation of the shape Eq. (11) that includes
higher order derivatives of the covariance matrix. This approximation will
consequently remain normalizable everywhere in parameter space. This result
generalizes the usual Fisher matrix in a straightforward way: if the
Taylor-approximation $T(C)$ is truncated at first order, the usual Fisher matrix
approximation Eq. (6) of the likelihood is obtained and the higher order
corrections are then


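L \approx N \exp\Big( -\frac{1}{4} \mathrm{Tr}\left[C_0^{-1} C_{,\alpha} C_0^{-1} C_{,\beta}\right] \Delta p_\alpha \Delta p_\beta
    - \frac{1}{4} \mathrm{Tr}\left[C_0^{-1} C_{,\alpha} C_0^{-1} C_{,\beta\gamma}\right] \Delta p_\alpha \Delta p_\beta \Delta p_\gamma
    - \frac{1}{16} \mathrm{Tr}\left[C_0^{-1} C_{,\alpha\beta} C_0^{-1} C_{,\gamma\delta}\right] \Delta p_\alpha \Delta p_\beta \Delta p_\gamma \Delta p_\delta
    - \frac{1}{12} \mathrm{Tr}\left[C_0^{-1} C_{,\alpha} C_0^{-1} C_{,\beta\gamma\delta}\right] \Delta p_\alpha \Delta p_\beta \Delta p_\gamma \Delta p_\delta
    - \frac{1}{24} \mathrm{Tr}\left[C_0^{-1} C_{,\alpha\beta} C_0^{-1} C_{,\gamma\delta\epsilon}\right] \Delta p_\alpha \Delta p_\beta \Delta p_\gamma \Delta p_\delta \Delta p_\epsilon
    - \frac{1}{144} \mathrm{Tr}\left[C_0^{-1} C_{,\alpha\beta\gamma} C_0^{-1} C_{,\delta\epsilon\phi}\right] \Delta p_\alpha \Delta p_\beta \Delta p_\gamma \Delta p_\delta \Delta p_\epsilon \Delta p_\phi \Big),   (22)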

where we have chosen to truncate the Taylor expansion of the covariance matrix at
third order for brevity; the continuation to fourth and higher orders of the
covariance matrix is however obvious from Eq. (21). The terms that are cubic and
quintic in the $\Delta p$ can become negative and thereby decrease the likelihood
estimate in regions where it had been overestimated by the even-order terms. In
total however, the terms combine to a quadratic form, and thereby the approximation
is known to not diverge anywhere in parameter space.
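That the individual orders recombine into a square can be verified for a scalar
covariance. In the sketch below, u, v and w stand for the first, second and third
derivatives of the covariance divided by its fiducial value; multiplying out the
square of Eq. (21) with the Taylor series truncated at third order gives the listed
terms. The numerical values are arbitrary and chosen so that the odd-order terms
are negative.

```python
def dali_exponent_square(u, v, w, dp):
    """1/4 (C0^-1 T)^2 for a scalar covariance, with T truncated at third order."""
    t = u * dp + v * dp**2 / 2.0 + w * dp**3 / 6.0
    return 0.25 * t * t

def dali_exponent_terms(u, v, w, dp):
    """The same quantity multiplied out order by order in dp (second to sixth)."""
    return [
        u * u * dp**2 / 4.0,
        u * v * dp**3 / 4.0,
        (v * v / 16.0 + u * w / 12.0) * dp**4,
        v * w * dp**5 / 24.0,
        w * w * dp**6 / 144.0,
    ]

u, v, w = 1.0, -3.0, 2.0   # illustrative scaled derivatives
dp = 0.5
terms = dali_exponent_terms(u, v, w, dp)
square = dali_exponent_square(u, v, w, dp)
print(terms, sum(terms), square)
```

Here the cubic and quintic terms are negative, yet the sum equals the square and
therefore cannot become negative, which is what keeps the approximation
normalizable.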

As this result generalizes the findings of Paper 1, and is also based on a
derivative expansion (this time of the covariance matrix), we stick with the name
DALI (Derivative Approximation for Likelihoods). If this approximate likelihood
shall be updated to a posterior distribution, multiplication by a prior can be
achieved by adding the log-likelihood of the prior to the DALI-approximation, just
as in case of the Fisher matrix approximation. Details about the expected speed-up
when compared to MCMC can be found in Paper 1.


5 CRITERIA OF APPLICABILITY 

Non-Gaussianity can arise from at least two sources. For example, if the data has
only little constraining power, then even the likelihood for a model with only
mildly non-linear parameters will pick up non-Gaussianities. In contrast, if the
data is very constraining, non-Gaussianities will still occur if parameters are
degenerate with each other over a finite range. In this case the non-Gaussianities
can be recovered by DALI.

The approximation of Eq. (21) is strictly valid if the following criteria are
fulfilled:

• The data set $x$ must be so constraining that the likelihood is confined to
within a region $\Delta p$ where the second order Taylor approximations Eq. (18)
and Eq. (17) dominate over their higher orders.

• Approximating the log in Eq. (19) requires

\mathrm{Tr}\left[C_0^{-1} T(C)\right] \ll 1,   (23)

which can be solved for parameter offsets $\Delta p$

\Delta p_\alpha \ll \frac{1}{\mathrm{Tr}\left[C_0^{-1} C_{,\alpha}\right]}.   (24)

The last requirement will be fulfilled if the data set confines the preferred
parameter space to an area within which the Taylor-approximation captures well the
variation of the covariance matrix throughout the parameter space. DALI is
therefore expected to work well in case of rather constraining data and degenerate
parameters, while a good recovery of non-Gaussianities for weakly constraining data
and mild non-linear dependences on the parameters would require
Taylor-approximating the log-likelihood to much higher orders, with the
corresponding difficulties detailed in Sec. (3). If the condition Eq. (24) is only
marginally fulfilled, the DALI-approximations will still converge although they
will not pick up all the shape-information of the likelihood. Mismatches between
the shape of the approximation and the real likelihood shape will then be observed.
This is already known from the Fisher matrix, and expected to be milder in DALI
since the higher order derivatives will correct the Fisher matrix misestimates.


6 PARAMETER DEPENDENT COVARIANCE 
MATRIX AND MEAN 

The DALI formalism described above is able to recover non-Gaussian likelihood
shapes if the covariance matrix depends on parameters. In the previous paper
Sellentin et al. (2014), a non-Gaussian likelihood approximation was developed for
the case when the covariance matrix is constant, and only the mean $\mu$ depends on
parameters. An interesting question is whether the two approximations can be
combined to approximate a likelihood where both mean and covariance matrix depend
on parameters. Multiple interesting aspects should be pointed out in this context.
One expects a likelihood approximation to fulfill three criteria: it shall be
positive-definite, normalizable, and additionally possess a high degree of shape
fidelity, i.e. quickly converge towards the shape of the unapproximated likelihood.
Positive definiteness and normalizability are guaranteed by DALI being of the shape
$L \propto \exp(-Q)$, with $Q$ being a positive definite form in the parameters. In
principle, any positive definite form could be chosen. However, our choice of using
the squared Taylor series of either $\mu$ or $C$ additionally guarantees the shape
fidelity of the DALI expansion. If the squared Taylor series were replaced by
another quadratic form, the shape fidelity would most likely be quickly lost. If
both $\mu$ and $C$ depend on the same parameters, no quadratic form that is at the
same time a Taylor series has been found so far, due to the appearance of
crossterms between derivatives of $\mu$ and $C$, e.g. $\mu_{,\alpha} C_{,\beta}$.
Neglecting these crossterms will produce a DALI-expansion that is a simple
multiplication of Eq. (21) and Eq. (16) from Sellentin et al. (2014). This may be a
good approximation in many cases, e.g. when the covariance matrix depends strongly
on some parameters but not on those on which the mean depends. However, due to
omitting the crossterms, in general this expansion will not be able to recover all
information and therefore it may not yield a good approximation.


7 TESTCASES 

The strength of this new approximation scheme was tested on two toy-models of
particularly severe non-Gaussianities which arise from degeneracies. Both toy
models are two-dimensional and have $\mu = 0$. The data set consists of 50 data
points. The covariance matrix of the first is diagonal and given by

C_{ij}(p) = (p_1^2 + p_2^2) \delta_{ij},   (25)

with the Kronecker-Delta $\delta_{ij}$. Since $p_1^2 + p_2^2 = 1$ is the equation
of a circle, this model produces a ring-shaped unapproximated likelihood, with the
interior of the ring being a region of zero likelihood. All points which lie
exactly on the circle will maximize the likelihood and any of them could be chosen
as fiducial point for evaluating the DALI approximation. Taking more than 50 data
points would decrease the thickness of the ring but would never be able to lift the
degeneracy, even for an infinite number of measurements. Such likelihoods appear
for example in particle physics for measurements of the
Cabibbo-Kobayashi-Maskawa matrix (Charles et al. 2015).
Figure 1. The unapproximated likelihood of Eq. (25) is depicted in grey. Since Eq. (25) is the equation of a circle, the likelihood has
a ring-shape. Left: The Fisher approximation in blue, with fiducial point indicated by the small blue dot. The Fisher matrix is singular
and therefore appears as a set of parallel lines. Right: in blue the DALI approximation using second order derivatives of the covariance
matrix Eq. (25).

Figure 2. Like Fig. (1), only for a likelihood using the covariance matrix Eq. (26). The likelihood is again depicted by the empty grey
ring, and the different approximations are depicted in blue: Fisher matrix (left), DALI with second order derivatives of the covariance
matrix (middle), DALI with third order derivatives (right).


The covariance matrix of the second toy model is

C_{ij}(p) = (p_1^4 + p_2^4) \delta_{ij},   (26)

which again possesses a closed degeneracy line of a somewhat boxy ring-shape.
Again, each point along the line $p_1^4 + p_2^4 = 1$ can serve as fiducial point
for the DALI approximation. The unapproximated likelihoods of these two models are
depicted as grey shades in Figs. (1-2), where the two shades indicate the 68% and
95% confidence contours. Both toy models were then approximated by Eq. (21),
truncated at different orders. The Fisher matrix of both cases is degenerate and
appears as parallel non-closing lines. Changing the point at which the derivatives
are evaluated cannot break this degeneracy. For the first toy model, the
second-order DALI-approximation already finds the full circle, since no higher than
second order derivatives exist in this case. For the second toy-model, a complete
recovery of the likelihood would require the calculation of fourth order
derivatives. Although this could be done analytically in the case at hand, in
general such a calculation would need a numerical solution. We therefore maintain
the truncation of the expansion Eq. (21) at third order, as implemented in the
public code DALI. The resulting approximation can be seen in Fig. (2, right) for
third order derivatives, or second order derivatives in Fig. (2, middle). The
degeneracy of the Fisher matrix is lifted in both cases, and the improvement in
shape fidelity can easily be seen. As typical applications of this method would not
possess such strong parameter degeneracies, it can be expected that the DALI-method
will reconstruct the likelihood contours with great accuracy.
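The behaviour of the first toy model can be reproduced in a few lines. The sketch
below is not the public DALI code; it assumes the fiducial point (1, 0) on the
circle and uses the fact that the covariance Eq. (25) is proportional to the unit
matrix, so that each trace reduces to a factor of the number of data points.

```python
import math

D = 50               # number of data points, as in the toy models
P_HAT = (1.0, 0.0)   # assumed fiducial point on the circle p1^2 + p2^2 = 1

def exact_exponent(p1, p2):
    """Data-averaged log-likelihood of the model Eq. (25), relative to its minimum."""
    r2 = p1 * p1 + p2 * p2
    return 0.5 * D * (math.log(r2) + 1.0 / r2 - 1.0)

def fisher_exponent(p1, p2):
    """Eq. (21) with the Taylor series truncated at first order (Fisher matrix)."""
    dp1, dp2 = p1 - P_HAT[0], p2 - P_HAT[1]
    t = 2.0 * (P_HAT[0] * dp1 + P_HAT[1] * dp2)
    return 0.25 * D * t * t

def dali_exponent(p1, p2):
    """Eq. (21) with the Taylor series truncated at second order."""
    dp1, dp2 = p1 - P_HAT[0], p2 - P_HAT[1]
    t = 2.0 * (P_HAT[0] * dp1 + P_HAT[1] * dp2) + dp1 * dp1 + dp2 * dp2
    return 0.25 * D * t * t

on_circle = (0.0, 1.0)    # another maximum of the exact likelihood
on_tangent = (1.0, 5.0)   # far from the circle, on the tangent at the fiducial point
for name, point in (("circle", on_circle), ("tangent", on_tangent)):
    print(name, exact_exponent(*point), fisher_exponent(*point), dali_exponent(*point))
```

The Fisher exponent vanishes along the whole tangent line, which gives the parallel
non-closing contours, and it penalizes other points on the circle. The second-order
exponent depends on the parameters only through p1^2 + p2^2 - 1, so it vanishes on
the full circle and grows both inside and outside of it.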

The DALI-code that combines the specialized likelihood approximations of our
previous paper, and the extension presented here, is public at DALI. However, due
to the structural similarity with the Fisher matrix, any already existing Fisher
code can easily be upgraded to a DALI-code by adding the higher order derivatives
of Eq. (21).